Difference-based database upgrade

ABSTRACT

A method for updating a database that combines the advantages of both the migration and incremental approach may resolve some of the problems associated with earlier methods. This may be accomplished by a methodology which incrementally updates new data while using the migration approach to create an entirely new schema. Such a method might compare the metadata from both the existing database and the data update to generate a new schema and integrate new data into the database.

BACKGROUND

It is typically desirable that a database reflect the most current andaccurate information available. To maintain effectiveness, databasesoften require periodic data and metadata updates. Currently, migrationand incremental updates offer two methods for updating databases,however, both approaches exhibit limitations. Using the migrationapproach, a database manager must develop an entirely new schema as wellas write new structured query language (SQL) migration scripts to movenew data into the schema. If a database requires frequent updates, adatabase manager using the migration approach would be forced to designnew schema and scripts for every revision. As an alternative, a managercould incrementally update the database by modifying the schema and datausing SQL scripts alone. However, database schema are typically complexand often impractical to partially modify due to inherent datadependencies and other factors which can only be accurately duplicatedby designing an entirely new schema. Due to these intricacies, databasesupdated using either the migration approach or incremental revisionswill likely suffer inaccuracies over time.

SUMMARY

A method for updating a database that combines the advantages of boththe migration and incremental approach may resolve some of the problemsassociated with earlier methods. This may be accomplished by amethodology which incrementally updates new data while using themigration approach to create an entirely new schema. Such a method mightcompare the metadata from both the existing database and the data updateto generate a new schema and integrate new data into the database.

DRAWINGS

FIG. 1 is a block diagram of a computing system that may operate inaccordance with the claims;

FIG. 2 is a flowchart explaining an embodiment of a control process forupdating a database;

FIG. 3 is a diagram which describes the relationship between variouscomponents during a database update;

FIG. 4 is a flowchart explaining an embodiment of a control process forinstalling a new database; and

FIG. 5 is a diagram which describes the relationship between variouscomponents during a new database install.

DESCRIPTION

Although the following text sets forth a detailed description ofnumerous different embodiments, it should be understood that the legalscope of the description is defined by the words of the claims set forthat the end of this patent. The detailed description is to be construedas exemplary only and does not describe every possible embodiment sincedescribing every possible embodiment would be impractical, if notimpossible. Numerous alternative embodiments could be implemented, usingeither current technology or technology developed after the filing dateof this patent, which would still fall within the scope of the claims.

It should also be understood that, unless a term is expressly defined inthis patent using the sentence “As used herein, the term ‘______’ ishereby defined to mean . . . ” or a similar sentence, there is no intentto limit the meaning of that term, either expressly or by implication,beyond its plain or ordinary meaning, and such term should not beinterpreted to be limited in scope based on any statement made in anysection of this patent (other than the language of the claims). To theextent that any term recited in the claims at the end of this patent isreferred to in this patent in a manner consistent with a single meaning,that is done for sake of clarity only so as to not confuse the reader,and it is not intended that such claim term by limited, by implicationor otherwise, to that single meaning. Finally, unless a claim element isdefined by reciting the word “means” and a function without the recitalof any structure, it is not intended that the scope of any claim elementbe interpreted based on the application of 35 U.S.C. § 112, sixthparagraph.

FIG. 1 illustrates an example of a suitable computing system environment100 on which a system for the steps of the claimed method and apparatusmay be implemented. The computing system environment 100 is only oneexample of a suitable computing environment and is not intended tosuggest any limitation of the scope of use or functionality of theclaimed method or apparatus. Neither should the computing environment100 be interpreted as having any dependency or requirement relating toany one or combination of components illustrated in the exemplaryoperating environment 100.

The steps of the claimed method and apparatus are operational withnumerous other general purpose or special purpose computing systemenvironments or configurations. Examples of well known computingsystems, environments, and/or configurations that may be suitable foruse with the methods or apparatus of the claims include, but are notlimited to, personal computers, server computers, hand-held or laptopdevices, multiprocessor systems, microprocessor-based systems, set topboxes, programmable consumer electronics, network PCs, minicomputers,mainframe computers, distributed computing environments that include anyof the above systems or devices, and the like.

The steps of the claimed method and apparatus may be described in thegeneral context of computer-executable instructions, such as programmodules, being executed by a computer. Generally, program modulesinclude routines, programs, objects, components, data structures, etc.,that perform particular tasks or implement particular abstract datatypes. The methods and apparatus may also be practiced in distributedcomputing environments where tasks are performed by remote processingdevices that are linked through a communications network. In adistributed computing environment, program modules may be located inboth local and remote computer storage media including memory storagedevices.

With reference to FIG. 1, an exemplary system for implementing the stepsof the claimed method and apparatus includes a general purpose computingdevice in the form of a computer or PC 110. Components of computer 110may include, but are not limited to, a processing unit 120, a systemmemory 130, and a system bus 121 that couples various system componentsincluding the system memory to the processing unit 120. The system bus121 may be any of several types of bus structures including a memory busor memory controller, a peripheral bus, and a local bus using any of avariety of bus architectures. By way of example, and not limitation,such architectures include Industry Standard Architecture (ISA) bus,Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, VideoElectronics Standards Association (VESA) local bus, and PeripheralComponent Interconnect (PCI) bus also known as Mezzanine bus.

Computer 110 typically includes a variety of computer readable media.Computer readable media can be any available media that can be accessedby computer 110 and includes both volatile and nonvolatile media,removable and non-removable media. By way of example, and notlimitation, computer readable media may comprise computer storage mediaand communication media. Computer storage media includes both volatileand nonvolatile, removable and non-removable media implemented in anymethod or technology for storage of information such as computerreadable instructions, data structures, program modules or other data.Computer storage media includes, but is not limited to, RAM, ROM,EEPROM, flash memory or other memory technology, CD-ROM, digitalversatile disks (DVD) or other optical disk storage, magnetic cassettes,magnetic tape, magnetic disk storage or other magnetic storage devices,or any other medium which can be used to store the desired informationand which can accessed by computer 110. Communication media typicallyembodies computer readable instructions, data structures, programmodules or other data in a modulated data signal such as a carrier waveor other transport mechanism and includes any information deliverymedia. The term “modulated data signal” means a signal that has one ormore of its characteristics set or changed in such a manner as to encodeinformation in the signal. By way of example, and not limitation,communication media includes wired media such as a wired network ordirect-wired connection, and wireless media such as acoustic, RF,infrared and other wireless media. Combinations of any of the aboveshould also be included within the scope of computer readable media.

The system memory 130 includes computer storage media in the form ofvolatile and/or nonvolatile memory such as read only memory (ROM) 131and random access memory (RAM) 132. A basic input/output system 133(BIOS), containing the basic routines that help to transfer informationbetween elements within computer 110, such as during start-up, istypically stored in ROM 131. RAM 132 typically contains data and/orprogram modules that are immediately accessible to and/or presentlybeing operated on by processing unit 120. By way of example, and notlimitation, FIG. 1 illustrates operating system 134, applicationprograms 135, other program modules 136, and program data 137.

The computer 110 may also include other removable/non-removable,volatile/nonvolatile computer storage media. By way of example only,FIG. 1 illustrates a hard disk drive 140 that reads from or writes tonon-removable, nonvolatile magnetic media, a magnetic disk drive 151that reads from or writes to a removable, nonvolatile magnetic disk 152,and an optical disk drive 155 that reads from or writes to a removable,nonvolatile optical disk 156 such as a CD ROM or other optical media.Other removable/non-removable, volatile/nonvolatile computer storagemedia that can be used in the exemplary operating environment include,but are not limited to, magnetic tape cassettes, flash memory cards,digital versatile disks, digital video tape, solid state RAM, solidstate ROM, and the like. The hard disk drive 141 is typically connectedto the system bus 121 through a non-removable memory interface such asinterface 140, and magnetic disk drive 151 and optical disk drive 155are typically connected to the system bus 121 by a removable memoryinterface, such as interface 150.

The drives and their associated computer storage media discussed aboveand illustrated in FIG. 1, provide storage of computer readableinstructions, data structures, program modules and other data for thecomputer 110. In FIG. 1, for example, hard disk drive 141 is illustratedas storing operating system 144, application programs 145, other programmodules 146, and program data 147. Note that these components can eitherbe the same as or different from operating system 134, applicationprograms 135, other program modules 136, and program data 137. Operatingsystem 144, application programs 145, other program modules 146, andprogram data 147 are given different numbers here to illustrate that, ata minimum, they are different copies. A user may enter commands andinformation into the computer 110 through input devices such as akeyboard 162 and pointing device 161, commonly referred to as a mouse,trackball or touch pad. Other input devices (not shown) may include amicrophone, joystick, game pad, satellite dish, scanner, or the like.These and other input devices are often connected to the processing unit120 through a user input interface 160 that is coupled to the systembus, but may be connected by other interface and bus structures, such asa parallel port, game port or a universal serial bus (USB). A monitor191 or other type of display device is also connected to the system bus121 via an interface, such as a video interface 190. In addition to themonitor, computers may also include other peripheral output devices suchas speakers 197 and printer 196, which may be connected through anoutput peripheral interface 190. Additionally, the printer 196 may beconnected via the network interface 170.

The computer 110 may operate in a networked environment using logicalconnections to one or more remote computers, such as a remote computer180. The remote computer 180 may be a personal computer, a server, arouter, a network PC, a peer device or other common network node, andtypically includes many or all of the elements described above relativeto the computer 110, although only a memory storage device 181 has beenillustrated in FIG. 1. The logical connections depicted in FIG. 1include a local area network (LAN) 171 and a wide area network (WAN)173, but may also include other networks. Such networking environmentsare commonplace in offices, enterprise-wide computer networks, intranetsand the Internet.

When used in a LAN networking environment, the computer 110 is connectedto the LAN 171 through a network interface or adapter 170. When used ina WAN networking environment, the computer 110 typically includes amodem 172 or other means for establishing communications over the WAN173, such as the Internet. The modem 172, which may be internal orexternal, may be connected to the system bus 121 via the user inputinterface 160, or other appropriate mechanism. In a networkedenvironment, program modules depicted relative to the computer 110, orportions thereof, may be stored in the remote memory storage device. Byway of example, and not limitation, FIG. 1 illustrates remoteapplication programs 185 as residing on memory device 181. It will beappreciated that the network connections shown are exemplary and othermeans of establishing a communications link between the computers may beused.

Generally, and with reference to FIGS. 2 and 3, a database may beupdated by a process which combines the advantages of both the migrationmethod and incremental updates. Specifically, the process begins atblock 200 when a user or process may create a file describing thecurrent database schema 300. The file 300 may represent the currentdatabase metadata. At block 210, a process 310 may load the filedescribing the current database 300 into a cache 320. At block 220, theuser or a software process may create a file describing the nextdatabase release 330. The file 330 may be described in a commoninformation format such as XML or the like and may be written manuallyor generated automatically. Like the file describing the currentdatabase 300, the file describing the next database release 330 maycontain database metadata. At block 230, the process 310 may load thefile describing the next database release 330 into a cache 340. Eachcache 320 and 340 may now contain files which represent metadatacorresponding to the current database 300 and a next release 330,respectively. The caches 320 and 340 may be used as a common denominatorto compare the current database against the next release by abstractingthe files 300 and 330 into objects which may be manipulated by varioussoftware processes. To ensure that each file may be compared against theother, the schema of both the current database and the database updatemay be separately serialized into files in a common information formatsuch as XML, then further deserialized into objects stored in theirrespective caches 320 and 340. At block 230, the caches 320 and 340 maycontain these comparable abstractions of the files 300 and 330,respectively.

At block 240, the process 310 may compare the caches 320 and 340 todetermine differences between the next database release and the currentdatabase's entities, attributes, and relationships. At block 250, theprocess 310 may determine that changes to the database are needed. Ifchanges to the database are needed, at block 260 the process 310 maysubmit the database changes as a series of requests to a component 350.Based on the changes needed, at block 270 the component 350 may thenderive a script or sequence of changes to the database in a databasequery language such as SQL. At block 280, the process 310 may executethe sequence of changes in two phases, 360 and 370. During the firstphase 360, the process 310 may add or update new entities and attributesto the database. During the second phase 370, the process 310 may deletethe previous entities, attributes, relationships, and views, and add newdatabase relationships and views based on the updated database. Thedatabase update may be complete after execution of the sequence ofchanges at block 280.

Generally, and with reference to FIGS. 4 and 5, a new database may becreated by a process similar to that described in relation to FIGS. 2and 3. The process begins at block 400 when a user or process may createa file describing the new database schema 500. The file 500 may bedescribed in a common information format such as XML or the like and maybe written manually or generated automatically. Cache 510 may contain anempty or null database object 520. Both caches 510 and 520 may nowcontain files which may represent metadata corresponding to the newdatabase 500 and an empty or null database object 520, respectively. Thecaches 510 and 520 may be used as a common denominator to compare thenew database against an empty database by abstracting the files 520 and500 into objects which may be manipulated by various software processes.To ensure that the files may be compared against each other, the nulldatabase 520 and the schema of the new database 500 may be separatelyserialized into files in a common information format such as XML, thenfurther deserialized into objects stored in their respective caches 510and 520. At block 410, the caches 510 and 520 may contain thesecomparable abstractions of the files 520 and 500, respectively.

At block 420, a process 480 may compare the caches 510 and 520 todetermine differences between the new database entities, attributes, andrelationships and the empty database 520. At block 430, a process 530may submit the new database changes as a series of requests to acomponent 540. Based on the changes, at block 440 the component 540 maythen derive a script or sequence of instructions to the database in adatabase query language such as SQL. At block 450, the process 530 mayexecute the sequence of changes in two phases. 550 and 560. During thefirst phase 550, the process 530 may add new entities and attributes tothe empty or null database object 520. During the second phase 560, theprocess 530 may add new database relationships and views. The newdatabase install may be complete after execution of the sequence ofinstructions at block 450.

Although the forgoing text sets forth a detailed description of numerousdifferent embodiments, it should be understood that the scope of thepatent is defined by the words of the claims set forth at the end ofthis patent. The detailed description is to be construed as exemplaryonly and does not describe every possible embodiment because describingevery possible embodiment would be impractical, if not impossible.Numerous alternative embodiments could be implemented, using eithercurrent technology or technology developed after the filing date of thispatent, which would still fall within the scope of the claims.

Thus, many modifications and variations may be made in the techniquesand structures described and illustrated herein without departing fromthe spirit and scope of the present claims. Accordingly, it should beunderstood that the methods and apparatus described herein areillustrative only and are not limiting upon the scope of the claims.

1. A method of updating a database comprising: loading a first metadatafrom the database into a first cache; loading a second metadata from afile into a second cache; comparing the first cache metadata and thesecond cache metadata; creating a request based on differences betweenthe first cache and the second cache; deriving a script from therequest; and, executing the script.
 2. The method of claim 1, whereinthe step of loading a first metadata from the database into a firstcache comprises serializing the first metadata from the database to acommon information format and deserializing the common informationformatted metadata to a format compatible with both the first cache andthe second cache.
 3. The method of claim 1, wherein the step of loadinga second metadata from a file into a second cache comprises serializingthe second metadata from the file to a common information format anddeserializing the common information formatted metadata to a formatcompatible with both the first cache and the second cache.
 4. The methodof claim 1, further comprising deriving the second metadata from a nextrelease.
 5. The method of claim 1, wherein the first metadata and thesecond metadata comprise at least one of an entity, an attribute, and arelationship.
 6. The method of claim 5, wherein the step of deriving thescript from the request comprises a first phase and a second phase. 7.The method of claim 6, wherein the first phase comprises at least oneof: adding a database entity; updating a database entity; adding adatabase attribute; and, updating a database attribute.
 8. The method ofclaim 6, wherein the second phase comprises at least one of: deleting adatabase entity; deleting a database attribute; deleting a databaserelationship; deleting a database view; adding a database relationship;and, creating a database view.
 9. A method of updating a databasecomprising: serializing a first metadata from the database to a commoninformation format; deserializing the common information formatted firstmetadata to a first comparable object; loading the first comparableobject into a first cache; serializing a second metadata from a databaseupdate file to a common information format; deserializing the commoninformation formatted second metadata to a second comparable object;loading the second comparable object into a second cache; comparing thefirst comparable object and the second comparable object; creating arequest based on differences between the first comparable object and thesecond comparable object; deriving a script from the request; and,executing the script.
 10. The method of claim 9, wherein the firstmetadata and the second metadata comprise at least one of an entity, anattribute, and a relationship.
 11. The method of claim 10, wherein thestep of deriving the script from the request comprises a first phase anda second phase.
 12. The method of claim 11, wherein the first phasecomprises at least one of: adding a database entity; updating a databaseentity; adding a database attribute; and, updating a database attribute.13. The method of claim 11, wherein the second phase comprises at leastone of: deleting a database entity; deleting a database attribute;deleting a database relationship; deleting a database view; adding adatabase relationship; and, creating a database view.
 14. A computerreadable medium comprising computer executable instructions for updatinga database comprising computer executable instructions for: loading afirst metadata from the database into a first cache; loading a secondmetadata from a database next release file into a second cache;comparing the first cache metadata and the second cache metadata;creating a request based on differences between the first cache and thesecond cache; deriving a script from the request; and, executing thescript.
 15. The computer readable medium of claim 14, wherein the stepof loading a first metadata from a database into a first cache comprisesserializing the first metadata from the database to a common informationformat and deserializing the common information formatted metadata to aformat compatible with both the first cache and the second cache. 16.The computer readable medium of claim 14, wherein the step of loading asecond metadata from a file into a second cache comprises serializingthe second metadata from the file to a common information format anddeserializing the common information formatted metadata to a formatcompatible with both the first cache and the second cache.
 17. Thecomputer readable medium of claim 14, wherein the first metadata and thesecond metadata comprise at least one of an entity, an attribute, and arelationship.
 18. The computer readable medium of claim 17, wherein thestep of deriving the script from the request comprises a first phase anda second phase.
 19. The computer readable medium of claim 18, whereinthe first phase comprises at least one of: adding a database entity;updating a database entity; adding a database attribute; and, updating adatabase attribute.
 20. The computer readable medium of claim 18,wherein the second phase comprises at least one of: deleting a databaseentity; deleting a database attribute; deleting a database relationship;deleting a database view; adding a database relationship; and, creatinga database view.